ABSTRACT


Teaching in formal academic environments typically follows 
a curriculum that specifies learning objectives that need to 
be met at each phase of a student’s academic progression. In 
this paper, we address the novel task of identifying document 
segments in educational material that are relevant for differ- 
ent learning objectives. Using a dynamic programming algo- 
rithm based on a vector space representation of sentences in 
a document, we automatically segment and then label doc- 
ument segments with learning objectives. We demonstrate 
the effectiveness of our approach on a real-world education 
data set. We further demonstrate how our system is use- 
ful for related tasks of document passage retrieval and QA 
using a large publicly available dataset. To the best of our 
knowledge we are the first to attempt the task of segment- 
ing and labeling education materials with academic learning 
objectives. 


Keywords 
text segmentation, document labeling, academic learning 
objectives, unsupervised 


1. INTRODUCTION 


The rapid growth of cost-effective smartphones and media
devices, coupled with technologies like Learning Content
Management Systems, tutoring systems, digital classrooms
and MOOC-based eLearning systems, is changing the way
today’s students are educated. A recent survey found that
there was a 45% year-on-year uptake between 2013 and 2014 
of digital content in the classroom and a nearly 82% uptake 
in the use of digital textbooks. Of the 400,000 K-12 stu- 
dents surveyed, 37% of them reported using online textbooks 
for their learning needs. Students and teachers frequently 
search for free and open education resources available online
to augment or replace existing learning material. Organizations
like MERLOT² and the Open Education Consortium³
offer and promote the use of free learning resources by indexing
material available on the web, based only on keywords
or user-specified meta-data. This makes the identification of
the most relevant resources difficult and time consuming. In
addition, relying on manually specified meta-data can
yield poor results because its quality, consistency and
coverage vary. Identifying the materials most suitable
for a learner can be aided by tagging them with learning
objectives from different curricula. However, manually labeling
material with learning objectives is not scalable, since
learning standards can contain tens of hundreds of objectives
and are prone to frequent revision. Recent work by
[3] attempted to address this problem by using external re- 
sources such as Wikipedia to expand the context of learning 
objectives and a tf-idf based vector representation of docu- 
ments and learning objectives. One of the limitations of the 
system is that it works well only when documents are rela- 
tively short in length and relate to a few learning standard 
objectives. The accuracy of the algorithm reduces when the 
documents considered are resources such as textbooks due 
to the dilution of the weights in the tf-idf based vector space 
model. Further, from the perspective of information access, 
returning a large reference book for a learning objective still 
burdens the user with the task of identifying the relevant 
portions of the book. This, therefore, does not adequately 
address the problem. 


In this paper, we address the problem of finding document 
segments most relevant to learning objectives, using docu- 
ment segmentation [1] and segment ranking. To the best of 
our knowledge, we are the first to attempt the problem of 
segmenting and labeling education materials with academic 
learning objectives. 


In summary, our paper makes the following contributions: 


• We define the novel task of identifying and labeling
document segments with academic learning objectives.


• We present the first system that identifies portions of
text most relevant for a learning objective in large educational
materials. We demonstrate the effectiveness
of our approach on a real-world education data set. We
report a sentence-level F1 score of 0.6 and a segment-level
minimal match accuracy@3 of 0.9.


• We demonstrate, using a large publicly available dataset,
how our methods can also be used for other NLP tasks
such as document passage retrieval and QA.


² http://www.merlot.org
³ http://www.oeconsortium.org/


The rest of the paper is organized as follows: In the next 
section we describe related work, in section 3 we formally 
describe our problem statement, section 4 describes our al- 
gorithm and implementation details and section 5 presents 
our detailed experiments. Finally, in section 6 we conclude 
this paper and discuss possible directions of future work. 


2. RELATED WORK 


Broadly, our work is related to three major areas of natural 
language research: Text Segmentation, Query Focused Sum- 
marization and Document Passage Retrieval. We present a 
comparison and discussion for each of these areas below: 


Text Segmentation: Typically, the problem of automatically
chunking text into smaller meaningful units has been
addressed by studying changes in vocabulary patterns [6]
and building topic models [5]. In [12], the authors adapt the
TextTiling algorithm from [6] to use topics instead of words.
Most recently, [1] uses semantic word embeddings for the
text segmentation task. While supervised approaches tend
to perform better, we decided to adapt the state-of-the-art
unsupervised text segmentation method proposed in [1], due
to the challenges associated with sourcing training data for
supervised learning.


Query Focused Summarization: Focused summarization
in our context [8], [10], [4] is the task of building summaries
of learning materials based on learning objectives.
Here, each learning objective can be treated as a query, and 
the learning materials as documents that need to be sum- 
marized. However, it is important to note that in the ed- 
ucation domain, any such summarization needs to ensure 
that summarized material is presented in a way that facil- 
itates learning. This poses additional research challenges 
such as automatically identifying relationships between con- 
cepts presented in the material and therefore, in this paper, 
we do not model our problem as a summarization task. We 
encourage the reader to consider it as a possible direction 
for future research. 


Document Passage Retrieval: Lastly, document pas- 
sage retrieval [2] is the task of fetching relevant document 
passages from a collection of documents based on a user 
query. However, such tasks typically require the passage 
boundaries to be well known and therefore, cannot return 
sub-portions that may be present within a passage or return 
results that span sub-parts of multiple passages. 


3. PROBLEM STATEMENT 


Typically, a learning standard consists of a hierarchical or- 
ganization of learning objectives where learning objectives 
are grouped by Topic, Course, Subject and Grade. For the
purpose of this paper we refer to a “label” as the complete
Grade (g) -> Subject (s) -> Course (c) -> Topic (t) ->
Learning Objective (l) path in the learning standard.


Given a document D of length N we would like to identify
the most relevant segments $\phi_{i,j}$ for a given label
{g, s, c, t, l}, where i, j denote positions in the document, i.e.
$i, j \in [0, N]$ and $i < j$. In the rest of the paper, we denote
the learning objective {g, s, c, t, l} as e to ease notation.


Figure 1 shows Chapter 2 from the “College Physics”
OpenStax textbook⁴. The segments (demarcated using rectangles)
have been identified for two learning objectives, INST1
and INST2, and occur in different portions of the book. They
can even be a sub-part of an existing section in a chapter, as
shown for INST1.


The next section describes our algorithm for the problem of 
segmentation and labeling based on learning objectives. 


4. OUR METHOD 


We represent each sentence as a unit vector $s_i$ ($0 \le i \le N-1$)
in a Dim-dimensional space. The goal of segmentation is
to find K splits in a document, denoted by $(x_0, x_1, \ldots, x_K)$,
where $x_0 = 0$ and $x_K = N$ and $x_k$ denotes the line number
specifying the segment boundary, such that if the kth segment
contains the sentence $s_i$, then $x_{k-1} \le i < x_k$. The
discovered segment $\phi_{i,j}$ is the segment between the splits
$x_i$ and $x_j$. Depending on the granularity of the learning
objectives and the document collection, the optimal number
of splits can be set (see Section 5). Let the cost function
$\psi(i, j)$ for a segment measure the internal cohesion of the
segment ($0 \le i < j \le N$). The segmentation score $\Psi$ for K
splits $s = (x_0, x_1, \ldots, x_K)$ can then be defined as:


$$\Psi(s) = \psi(x_0, x_1) + \psi(x_1, x_2) + \ldots + \psi(x_{K-1}, x_K)$$


To find the optimal splits in the document based on the
cost function $\psi$, we use dynamic programming. The cost of
splitting, $\Psi(N, K)$, is the cost of splitting sentences 0 to N
using K splits. So,


$$\Psi(N, 1) = \psi(0, N)$$
$$\Psi(N, K) = \min_{l < N} \; \Psi(l, K-1) + \psi(l, N)$$
We define the $\psi$ function as follows:

$$\psi(i, j) = \sum_{i \le l < j} \lVert s_l - \mu(i, j) \rVert^2$$


where $\psi(i, j)$ is analogous to the intra-cluster distance in
traditional document clustering while $\mu(i, j)$ is a representative
vector of the segment. We discuss possible forms of
$\mu$ later in this section.
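

The recurrence above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: sentence vectors are plain lists of floats, the representative vector is taken to be the segment mean, K is treated as the number of resulting segments, and the function names (mean_vector, segment_cost, segment) are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean, used here as the representative vector mu(i, j)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def segment_cost(sentences: Sequence[Vector], i: int, j: int) -> float:
    """psi(i, j): squared distance of sentences i..j-1 to their mean."""
    seg = sentences[i:j]
    mu = mean_vector(seg)
    return sum(sum((x - m) ** 2 for x, m in zip(s, mu)) for s in seg)


def segment(sentences: Sequence[Vector], k: int) -> List[int]:
    """Return boundaries [x_0, ..., x_K] minimising the summed segment cost."""
    n = len(sentences)
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and the number of sentences")
    inf = float("inf")
    # best[m][j]: minimal cost of cutting sentences[0:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for l in range(m - 1, j):
                cost = best[m - 1][l] + segment_cost(sentences, l, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    back[m][j] = l
    # Walk the back-pointers from the end of the document to position 0.
    bounds = [n]
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        bounds.append(j)
    return bounds[::-1]
```

The sketch recomputes each segment cost from scratch, which keeps it short but is slow for long documents; caching the costs or using prefix sums would be the natural refinement.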


Ranking: Each segment is represented as a normalized vector
$\mu(i, j)$ and we determine the most relevant segments for a
learning objective e by ranking segments in decreasing order
of cosine similarity to e:


$$\cos(\mu, e) = \sum_{d=1}^{Dim} \mu_d \, e_d$$

⁴ https://openstax.org/details/college-physics
[Figure 1 image: excerpts from pages 37, 40 and 61 of Chapter 2, Kinematics, with rectangles marking the segments for INST1 and for INST2 (“Calculate average velocity, instantaneous velocity and acceleration in a given frame of reference”).]

Figure 1: Excerpts from Chapter 2, Kinematics, of the College Physics textbook by OpenStax, along
with the segment boundaries for two learning objectives, INST1 and INST2, shown in red and green respectively.


We then select the top n ranked segments as the segments 
relevant to the learning objective. In section 5.3 we describe 
how the number of splits K as well as the value of n can be 
chosen empirically given a validation data set. 
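

The ranking step can be illustrated with a short sketch. It assumes that segment vectors and the learning objective vector are already available as lists of floats in the same space; the helper names (normalize, cosine, top_segments) are hypothetical and chosen only for this example.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity as the dot product of the normalised vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def top_segments(segment_vectors: Sequence[Sequence[float]],
                 objective: Sequence[float], n: int) -> List[int]:
    """Indices of the n segments most similar to the objective, best first."""
    order = sorted(range(len(segment_vectors)),
                   key=lambda idx: cosine(segment_vectors[idx], objective),
                   reverse=True)
    return order[:n]
```

For example, with segment vectors [1, 0], [0, 1] and [1, 1] and an objective vector [0, 2], the call top_segments(..., n=2) returns the second and third segments, in that order.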


We now describe different methods of constructing the doc- 
ument and segment vectors: 


TF-IDF: Each sentence is represented as a bag of words,
the dimensionality being the vocabulary size. Each word $v_i$ in
a sentence is weighted by its tf-idf measure. For a word
$v_i$ in the sentence $s_j$ of a document $d$, the tf-idf measure is
given by:


$$\mathrm{tfidf}(v_i)_{s_j, d} = f(v_i, d) \, \log\left(\frac{|D|}{df(v_i)}\right)$$

where $f(v_i, d)$ is the frequency of the word $v_i$ in the document
$d$, $|D|$ is the total number of documents in our
corpus and $df(v_i)$ is the number of documents containing the word
$v_i$. The segment vector $\mu(i, j)$ in this case is the mean
of the sentence vectors in that segment.
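

As a rough illustration of this weighting, the sketch below builds an idf table from a tokenised corpus and a tf-idf vector for one sentence. Tokenisation, the fixed vocabulary list, and the function names (idf_table, tfidf_sentence_vector) are assumptions made for the example only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def idf_table(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """log(|D| / df(v)) for every word v seen in the tokenised corpus."""
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each word once per document
    return {word: math.log(len(corpus) / count) for word, count in df.items()}


def tfidf_sentence_vector(sentence: Sequence[str], document: Sequence[str],
                          idf: Dict[str, float],
                          vocab: Sequence[str]) -> List[float]:
    """Vector over vocab; words absent from the sentence get weight 0."""
    freq = Counter(document)  # f(v, d): frequency of v in the whole document
    present = set(sentence)
    return [freq[w] * idf.get(w, 0.0) if w in present else 0.0 for w in vocab]
```

A word that occurs in every corpus document receives an idf of zero and therefore contributes nothing to the sentence vector.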


Word Vector: We represent each sentence as a weighted
combination of the word vectors in the sentence. The word
vector $w_i$ for each word $v_i$ is obtained using Mikolov’s Word2Vec [9].
Each sentence $s_j$ is represented as:


$$s_j = \sum_{v_i \in s_j} \log\left(\frac{|D|}{df(v_i)}\right) w_i$$


The segment vector $\mu(i, j)$ is also the mean vector in this
case.
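

A weighted combination of word vectors can be sketched as below. The sketch assumes an idf-style weight per word and a dictionary of pre-trained word vectors; both the weighting choice and the names (weighted_sentence_vector, mean_segment_vector) are assumptions for illustration, and out-of-vocabulary words are simply skipped.

```python
from typing import Dict, List, Sequence


def weighted_sentence_vector(sentence: Sequence[str],
                             word_vectors: Dict[str, Sequence[float]],
                             weights: Dict[str, float]) -> List[float]:
    """Sum of word vectors, each scaled by its per-word weight."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in sentence:
        if word not in word_vectors:
            continue  # out-of-vocabulary words are ignored in this sketch
        weight = weights.get(word, 0.0)
        for d, x in enumerate(word_vectors[word]):
            total[d] += weight * x
    return total


def mean_segment_vector(sentence_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Segment vector as the mean of its sentence vectors."""
    dim = len(sentence_vectors[0])
    count = len(sentence_vectors)
    return [sum(v[d] for v in sentence_vectors) / count for d in range(dim)]
```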


Fisher Vector: Paragraph vectors [7] try to embed the
sentences in a fixed dimension, but they require extensive
training on the source dataset. Instead we use Fisher Vectors,
which have been widely used in the vision community
[11] for combining different feature vectors (word vectors
in our case), and were recently used for question retrieval
by Zhou et al. [15]. The word vocabulary is modeled
as a Gaussian Mixture Model (GMM), since a GMM can approximate
any arbitrary continuous probability density function.
Let $\lambda = \{\theta_j, \mu_j, \Sigma_j,\ j = 1 \ldots N_G\}$ be the parameters of the
GMM with $N_G$ Gaussians. Let $\{w_1, w_2, \ldots, w_T\}$ be the vectors
for the words $v_1, v_2, \ldots, v_T$ in the sentence $s_i$ for which
we need to construct the Fisher vector. We define $\gamma_j(w_t)$ to
be the probability that the word $w_t$ is assigned to the Gaussian
$j$:

$$\gamma_j(w_t) = \frac{\theta_j \, \mathcal{N}(w_t \mid \mu_j, \Sigma_j)}{\sum_{u=1}^{N_G} \theta_u \, \mathcal{N}(w_t \mid \mu_u, \Sigma_u)}$$


We define the gradient vector as the score for a sentence,
$G_\lambda(s_i)$ [13]. To compare two sentences, the Fisher Kernel is
applied on these gradients,


$$K(s_i, s_j) = G_\lambda(s_i)^T F_\lambda^{-1} G_\lambda(s_j)$$

where $F_\lambda$ is the Fisher Information Matrix,


$$F_\lambda = E_{x \sim p(x \mid \lambda)}\left[G_\lambda(s_i) \, G_\lambda(s_j)^T\right]$$


$F_\lambda^{-1}$ can be decomposed as $L_\lambda^T L_\lambda$, hence the Fisher Kernel
can be decomposed into two normalized vectors, $\Gamma_\lambda(s_i) =
L_\lambda G_\lambda(s_i)$. This $\Gamma_\lambda(s_i)$ is the Fisher vector for the sentence.


The final Fisher vector is the concatenation of all $\Gamma_{\mu_j^d}(s_i)$
and $\Gamma_{\sigma_j^d}(s_i)$ for all $j = 1 \ldots N_G$, $d = 1 \ldots Dim$, hence a
$2 \times N_G \times Dim$ dimensional vector. We define the segment vector
$\mu(i, j)$ as the Fisher vector formed by using the word vectors
in the segment, hence giving us a unified representation of
the segment.
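

To make the construction more concrete, the sketch below computes the posteriors and a Fisher vector for a small set of word vectors. It assumes diagonal covariances and the commonly used closed-form gradients with respect to the means and standard deviations from the improved Fisher kernel formulation [11]; whether these match the exact normalisation used in our system is an assumption, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def gaussian_diag(x: Vec, mean: Vec, sigma: Vec) -> float:
    """Density of a diagonal-covariance Gaussian at x."""
    log_p = 0.0
    for xd, md, sd in zip(x, mean, sigma):
        log_p += -0.5 * math.log(2 * math.pi * sd * sd)
        log_p += -((xd - md) ** 2) / (2 * sd * sd)
    return math.exp(log_p)


def posteriors(x: Vec, weights: Vec, means: Sequence[Vec],
               sigmas: Sequence[Vec]) -> List[float]:
    """gamma_j(x): probability that x is assigned to Gaussian j."""
    joint = [w * gaussian_diag(x, m, s)
             for w, m, s in zip(weights, means, sigmas)]
    total = sum(joint)
    return [p / total for p in joint]


def fisher_vector(word_vecs: Sequence[Vec], weights: Vec,
                  means: Sequence[Vec], sigmas: Sequence[Vec]) -> List[float]:
    """Concatenated mean and std-dev gradients: 2 * N_G * Dim values."""
    count = len(word_vecs)
    dim = len(means[0])
    gammas = [posteriors(w, weights, means, sigmas) for w in word_vecs]
    grad_mu: List[float] = []
    grad_sigma: List[float] = []
    for j, (theta, mean, sigma) in enumerate(zip(weights, means, sigmas)):
        for d in range(dim):
            # Standardised offset of each word vector from Gaussian j.
            z = [(w[d] - mean[d]) / sigma[d] for w in word_vecs]
            g_mu = sum(gammas[t][j] * z[t] for t in range(count))
            g_sg = sum(gammas[t][j] * (z[t] ** 2 - 1.0) for t in range(count))
            grad_mu.append(g_mu / (count * math.sqrt(theta)))
            grad_sigma.append(g_sg / (count * math.sqrt(2.0 * theta)))
    return grad_mu + grad_sigma
```

The same function applied to all word vectors of a segment, rather than of a single sentence, gives a segment-level vector of the same dimensionality.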


5. EXPERIMENTS 


In this section we evaluate our method for identifying doc- 
ument segments suited for learning objectives. 


5.1 Data 


We made use of two data sets for our experiments: 


AKS labeled Data Set: We use the collection of 110
Science documents used by [3], labeled with 68 learning objectives
from the Academic Knowledge and Skills (AKS)⁵.
We also used term expansions as described in [3] to increase 
the context of learning objectives. We further identified doc- 
ument segments (at the sentence level) suitable for the learn- 
ing standard in each of the documents, where applicable. 


To build a collection of documents covering multiple learning
objectives, we simulated the creation of large academic
documents such as textbooks, by augmenting each lecture
note with 9 randomly selected lecture notes. Thus, for each
of the 68 learning objectives covered in our data set, we
created 5 larger documents, each consisting of 10 documents
from the original set, giving us a document collection of 340
large documents, with an average length of 300 sentences.
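

The construction of the simulated large documents can be sketched as follows. The sketch assumes one target lecture note per large document, a random position for the target among the 9 fillers, and a fixed random seed; these details and the function name build_large_documents are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def build_large_documents(notes: Sequence[Sequence[str]], copies: int = 5,
                          extra: int = 9, seed: int = 0
                          ) -> List[Tuple[List[str], Tuple[int, int]]]:
    """For each note, build `copies` documents of the note plus `extra` others.

    Returns (sentences, (start, end)) pairs, where start/end give the
    position of the target note inside the concatenated document.
    """
    rng = random.Random(seed)
    large_docs = []
    for idx in range(len(notes)):
        others = [i for i in range(len(notes)) if i != idx]
        for _ in range(copies):
            parts = [idx] + rng.sample(others, extra)
            rng.shuffle(parts)  # the target may land anywhere in the document
            doc: List[str] = []
            gold = (0, 0)
            for p in parts:
                if p == idx:
                    gold = (len(doc), len(doc) + len(notes[p]))
                doc.extend(notes[p])
            large_docs.append((doc, gold))
    return large_docs
```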


Dataset        #Docs   Avg. #Sentences   Avg. #Splits
AKS Dataset    340     300               10
WikiQA         8100    180               10


WikiQA Dataset: To show the general applicability of our 
approach on tasks such as document passage retrieval and 
QA, we also use the recently released WikiQA data set [14] 
which consists of 3047 questions sampled from Bing⁶ query
logs and associated with answers in a Wikipedia summary 
paragraph. As outlined in the approach above, for each of 
the questions, we created a larger document by including 
9 other randomly selected answer passages. For each of the 
2700 questions from the Train and Test collection we created 
3 such documents, thus giving us 8100 documents. 


5.2 Evaluation Metrics 
We define the following metrics for our evaluation: 


MRR (Mean Reciprocal Rank): The MRR is the mean, over
all queries, of the reciprocal rank of the first correct result in a
ranked list of candidate results.
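

A minimal sketch of this metric is given below. It assumes each query's ranked results are already marked as correct or incorrect, and that a query with no correct result contributes zero; the function name is hypothetical.

```python
from typing import Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """rankings: per query, a ranked list of flags (True = correct result)."""
    total = 0.0
    for flags in rankings:
        for rank, correct in enumerate(flags, start=1):
            if correct:
                total += 1.0 / rank  # only the first correct result counts
                break
    return total / len(rankings)
```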


P@N (Precision@N): Let the set of sentences in the top
N segments identified be $\Gamma^{sys}$ and further, let the set of
sentences in the gold standard be $\Gamma^{gold}$. The precision@N
is given by:

$$\mathrm{P@N} = \frac{|\Gamma^{sys} \cap \Gamma^{gold}|}{|\Gamma^{sys}|} \qquad (3)$$


⁵ https://publish.gwinnett.k12.ga.us/gcps/home/public/parents/content/general-info/aks
⁶ http://www.bing.com


R@N (Recall@N): Using the same notation described above,
the recall@N is given by:


$$\mathrm{R@N} = \frac{|\Gamma^{sys} \cap \Gamma^{gold}|}{|\Gamma^{gold}|} \qquad (4)$$


F1@N (F1 Score@N): The F1 Score@N is given by the
harmonic mean of the Precision@N and Recall@N described
above.


MMA@N (Minimal Match Accuracy@N): For
a collection of D labeled documents, the minimal match
accuracy@N is a relaxed value of precision and is given by:


$$\mathrm{MMA@N} = \frac{\sum_{d=1}^{D} \mathbb{1}\{\text{a match is found in the top } N \text{ segments of document } d\}}{D} \qquad (5)$$


where $\mathbb{1}\{\cdot\}$ is the indicator function.
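

The sentence-level metrics can be computed directly from sets of sentence indices, as in the sketch below. The minimal match accuracy function assumes that a per-document flag is already available saying whether a match was found in the top N segments; that reading of the indicator, and both function names, are assumptions for this example.

```python
from typing import Iterable, Sequence, Tuple


def precision_recall_f1(system: Iterable[int],
                        gold: Iterable[int]) -> Tuple[float, float, float]:
    """Sentence-level P, R and F1 between system and gold sentence sets."""
    sys_set, gold_set = set(system), set(gold)
    overlap = len(sys_set & gold_set)
    p = overlap / len(sys_set) if sys_set else 0.0
    r = overlap / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


def minimal_match_accuracy(match_found: Sequence[bool]) -> float:
    """Fraction of documents whose top-N segments contain a match."""
    return sum(1 for hit in match_found if hit) / len(match_found)
```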


5.3 Experimental Setup

For the AKS dataset, we calculate the idf using a collection
of 6000 Science documents from Wikibooks⁷ and Project
Gutenberg⁸. For the WikiQA dataset, idf was calculated on
the 2700 summaries in the training and test collection. Word
vectors and Fisher vectors were trained on the full collection
of English Wikipedia articles to ensure that the Gaussian
Mixture Model is not trained on a skewed dataset and can be
used universally for all kinds of English educational
documents. The number of Gaussians was selected based
on the Bayesian information criterion.⁹


Choosing the number of top segments: The number 
of top ranked segments n and the number of splits K both 
affect the accuracy of the system. For instance, if we set 
K to be half the total number of sentences, the resulting 
segments will be very small. Therefore, the value of n needs 
to be higher to have adequate coverage (recall). Similarly, 
choosing very few splits can result in large chunks, which 
can be problematic if the learning objectives were precise 
and required finer segments. Thus, the choice of n and K
depends on the granularity of specification in the learning 
objectives as well as the nature of content in the document 
collection. 


We use 20% of the dataset (selected at random) as the validation
set for tuning the parameters n and K. By varying
both n and K we can determine the values at which the system
performance (measured using F1 score) is best. Figure
2 shows the variation in F1 score for different values of K
and n. For clarity of presentation, we only show this for the
system using TF-IDF vectors. As can be seen, the F1 score
is best for 10 splits and choosing the 3 best segments closest
to the learning objective, i.e., K = 10, n = 3. Figures 3 and 4
show the individual contributions to the F1 score.
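

The tuning procedure amounts to a small grid search over K and n. The sketch below assumes a caller-supplied function that returns the validation F1 score for a given pair; that callback and the name grid_search are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def grid_search(evaluate: Callable[[int, int], float],
                split_values: Iterable[int],
                top_values: Iterable[int]) -> Optional[Tuple[float, int, int]]:
    """Return (best score, K, n) over all pairs with n <= K."""
    top_values = list(top_values)
    best: Optional[Tuple[float, int, int]] = None
    for k in split_values:
        for n in top_values:
            if n > k:
                continue  # cannot select more segments than exist
            score = evaluate(k, n)
            if best is None or score > best[0]:
                best = (score, k, n)
    return best
```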


5.4 Results 


5.4.1 Document Segmentation and Labeling 

On performing segmentation on the AKS dataset using all
three vector approaches, we observe (Table 1) that the tf-idf
vector representation works best. We noticed that many


⁷ http://www.wikibooks.org
⁸ http://www.gutenburg.org
⁹ An index used for model selection: $-2 L_m + m \ln n$, where $L_m$
is the maximized log-likelihood, m is the number of parameters
and n is the sample size.
                                       @1                      @3                      @5
Query Expansion   Method       P      R      F1        P      R      F1        P      R      F1
No Expansion      TFIDF        0.669  0.359  0.468     0.493  0.698  0.578     0.395  0.843  0.538
No Expansion      WORDVEC      0.462  0.357  0.403     0.331  0.633  0.434     0.284  0.829  0.423
No Expansion      FISHER       0.476  0.366  0.414     0.342  0.679  0.454     0.284  0.855  0.426
With Expansion    TFIDF        0.686  0.320  0.436     0.545  0.701  0.613     0.435  0.856  0.577
With Expansion    WORDVEC      0.483  0.323  0.387     0.351  0.586  0.439     0.308  0.797  0.444
With Expansion    FISHER       0.481  0.322  0.386     0.351  0.619  0.448     0.305  0.827  0.445

Table 1: Results on the AKS Labeled Dataset


Method     MRR    MMA@1   MMA@3   MMA@5
TFIDF      0.78   0.652   0.905   0.882
WORDVEC    0.56   0.429   0.635   0.782
FISHER     0.55   0.405   0.620   0.715

Table 2: Segment Level Results on AKS Dataset


[Figure 2 plot: F1 score (y-axis) against number of splits (x-axis).]

Figure 2: F1 variation with number of segments at
varying depths of retrieval. Best score at 10 segments
at depth 3.


of the documents in the AKS data set were very well contextualized
when changing topics, thus blurring the segment
boundaries. For example, in one of the documents which described
“Motion in a Straight Line”, the concepts of “velocity”,
“acceleration” and “position-time” graphs are intertwined
and the topical drift is not easy to observe. As a result, due
to the nature of documents in the collection, we hypothesize
that the Fisher vectors and word vectors, which have been
trained on large general corpora, are unable to adequately
distinguish some portions of the text, while the tf-idf vectors,
which have been tuned on the corpus, better reflect the
word distributions.


The precision, recall and F1 scores are calculated at the
sentence level, which makes them very strict measures. We therefore
also report segment level accuracy, i.e. how many of the top
n segments identified were relevant. A predicted segment
is labeled relevant to the external query if at least 70% of
the segment overlaps with the gold labeled segments. We
evaluate the performance using MRR and MMA@N. Table
2 shows the segment level evaluation of our system.
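

The 70% overlap rule can be sketched as follows. The sketch assumes a segment is given as a half-open range of sentence indices and the gold label as a set of sentence indices; the function name is_relevant is hypothetical.

```python
from typing import Set, Tuple


def is_relevant(segment: Tuple[int, int], gold_sentences: Set[int],
                threshold: float = 0.7) -> bool:
    """True if at least `threshold` of the segment's sentences are gold."""
    start, end = segment  # end is exclusive
    length = end - start
    if length <= 0:
        return False
    covered = sum(1 for i in range(start, end) if i in gold_sentences)
    return covered / length >= threshold
```

The resulting per-segment flags, taken in ranked order, are the kind of input that segment-level measures such as MRR operate on.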


5.4.2 Passage Retrieval and QA 

We also conducted experiments with a more discriminative
dataset where the topical shift is not as hard to observe. We
report (Table 3) an MRR of 0.895 and P@1 of 89.4% for the
passage retrieval task on each of the documents generated
as described in Section 5.1.


[Figure 3 plot: precision (y-axis) against number of splits (x-axis).]

Figure 3: Precision variation with number of segments
at varying depths of retrieval. Low values of n and high
values of K give high precision. Increasing K while keeping
n constant gives a drop in precision.


[Figure 4 plot: recall (y-axis) against number of splits (x-axis).]

Figure 4: Recall variation with number of segments at
varying depths of retrieval. Recall is higher at low values
of K and high values of n, and the recall drops considerably
as the number of segments K increases.


Further, we also describe our results on the original task, 
proposed with the data set, of finding the answer in a pas- 
sage for a question. In our experiments we report results 
under two conditions: (a) First identifying the best passage 
and then choosing the best sentence (b) Assuming the best 
passage is already known and then choosing the best sen- 
tence that answers the query (original WikiQA QA task). 
Table 4 presents results of experiments under both these 
conditions. It can be seen that our system gives comparable
results under both conditions. The state-of-the-art result
under condition (b), as reported in the original paper, is an
MRR of 0.696. Our system, though not designed for the
original task, reaches an MRR of 0.597 under this condition,
about 0.10 lower than the best system reported.
                                       @1                      @3
Method     MRR    MMA@1   MMA@3    P      R      F1        P      R      F1
TFIDF      0.807  0.797   0.812    0.839  0.893  0.865     0.308  0.958  0.466
WORDVEC    0.895  0.877   0.913    0.894  0.914  0.904     0.315  0.984  0.478
FISHER     0.865  0.842   0.887    0.863  0.885  0.874     0.298  0.975  0.457

Table 3: WikiQA Passage Retrieval Results


Method     MRR (Top Segment)   MRR (Gold Standard Passage)
TFIDF      0.528               0.495
WORDVEC    0.548               0.586
FISHER     0.577               0.597

Table 4: Finding the sentence answering the question:
“Top segment” uses our system to select the best passage
while “Gold standard passage” uses the actual passage
labeled in the data set


6. DISCUSSION AND CONCLUSION


In this paper we described the novel task of automatically
segmenting and labeling documents with learning standard
objectives. Using a state-of-the-art dynamic programming
algorithm for text segmentation, we demonstrate its use for
this problem and report a sentence level F1 score of 0.613
and segment level MMA@3 of 0.9. We also demonstrated
the effectiveness of our approach on document passage retrieval
and QA tasks.


Our method is completely unsupervised and only requires a
small validation set for parameter tuning. This makes our
work general and easily applicable across different geographies
and learning standards. Identifying document segments
best suited for learning objectives is a challenging
problem. For instance, portions of documents that introduce
or summarize topics or build a background in an area
are very hard to disambiguate for the algorithm due to the
lack of observable topic shifts. Developing more sophisticated
cohesion and topical diversity measures to address this
problem could be an interesting direction of further research.


In future work, we would also like to explore methods that


educational content with academic learning standards.
In Proceedings of the 2015 SIAM International
Conference on Data Mining, Vancouver, BC, Canada,
April 30 - May 2, 2015, pages 136-144, 2015.
[4] H. Daumé III and D. Marcu. Bayesian query-focused
summarization. In Proceedings of the 21st
International Conference on Computational
Linguistics and the 44th annual meeting of the
Association for Computational Linguistics, pages
305-312. Association for Computational Linguistics,
2006.
[5] L. Du, J. K. Pate, and M. Johnson. Topic
segmentation in an ordering-based topic model. 2015.
[6] M. A. Hearst. Texttiling: A quantitative approach to
discourse segmentation. Technical report, Citeseer,
1993.
[7] Q. V. Le and T. Mikolov. Distributed representations
of sentences and documents. arXiv preprint
arXiv:1405.4053, 2014.
[8] J.-P. Mei and L. Chen. Sumcr: a new subtopic-based
extractive approach for text summarization.
Knowledge and information systems, 31(3):527-545,
2012.
[9] T. Mikolov, K. Chen, G. Corrado, and J. Dean.
word2vec, 2014.
[10] Y. Ouyang, W. Li, S. Li, and Q. Lu. Applying
regression models to query-focused multi-document
summarization. Information Processing &
Management, 47(2):227-237, 2011.
[11] F. Perronnin, J. Sánchez, and T. Mensink. Improving
the fisher kernel for large-scale image classification. In
Computer Vision-ECCV 2010, pages 143-156.
Springer, 2010.
[12] M. Riedl and C. Biemann. Text segmentation with